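This work introduces a new method to account for subjectivity and, more generally, context dependence in text analysis, using the detection of emotions conveyed in text as an example. The proposed method accounts for subjectivity through a computational version of the frame theory of Marvin Minsky (1974), leveraging the text vectorization of Mikolov et al. (2013), which generates distributed representations of words based on the contexts in which they appear. Our approach rests on three components: 1. a frame/"room" representing the point of view; 2. a benchmark representing the criteria of the analysis, in this case an emotion classification taken from Robert Plutchik's (1980) study of human emotions; 3. the document to be analyzed. Using similarity measures between words, we can extract the relative relevance of the benchmark elements (in our case study, emotions) for the document to be analyzed. Our method provides a measure that takes into account the point of view of the entity reading the document. The method can be applied in all cases where evaluating subjectivity is relevant to understanding the relative value or meaning of a text. Subjectivity need not be limited to human reactions; it can also be used to give a text an interpretation related to a given domain ("room"). To evaluate our method, we used a test case in the political domain.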
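The three components described above (frame, benchmark, document) can be illustrated with a minimal sketch. It assumes one plausible reading of the method: each benchmark emotion is scored by its average cosine similarity to the document's words, using word vectors trained on the frame's corpus. The function names and the two-dimensional toy vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def emotion_relevance(doc_words, emotions, frame_vectors):
    """Score each benchmark emotion by its mean similarity to the document words.

    frame_vectors maps words to vectors learned from the frame ("room") corpus,
    so the same document can score differently under a different frame.
    """
    scores = {}
    for emotion in emotions:
        sims = [
            cosine(frame_vectors[word], frame_vectors[emotion])
            for word in doc_words
            if word in frame_vectors
        ]
        scores[emotion] = sum(sims) / len(sims) if sims else 0.0
    return scores

# Hypothetical toy vectors standing in for embeddings trained on one frame.
frame_vectors = {
    "joy": [1.0, 0.0],
    "fear": [0.0, 1.0],
    "victory": [0.9, 0.1],
    "crisis": [0.1, 0.9],
}

scores = emotion_relevance(["victory"], ["joy", "fear"], frame_vectors)
```

Swapping in vectors trained on a different corpus changes the scores for the same document, which is how this sketch models a change of point of view.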
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Text classification is a natural language processing (NLP) task relevant to many commercial applications, such as e-commerce and customer service. Classifying such excerpts accurately is often challenging because of intrinsic aspects of language, such as irony and nuance. To accomplish this task, one must provide a robust numerical representation for documents, a process known as embedding. Embedding is a key NLP field today and has advanced significantly in the last decade, especially after the introduction of the word-to-vector concept and the popularization of Deep Learning models for solving NLP tasks, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer-based Language Models (TLMs). Despite the impressive achievements in this field, the literature on generating embeddings for Brazilian Portuguese texts is scarce, especially when considering commercial user reviews. Therefore, this work aims to provide a comprehensive experimental study of embedding approaches targeting binary sentiment classification of user reviews in Brazilian Portuguese. The study covers models ranging from classical (Bag-of-Words) to state-of-the-art (Transformer-based) NLP approaches. The methods are evaluated on five open-source databases with pre-defined data partitions, made available in an open digital repository to encourage reproducibility. The fine-tuned TLMs achieved the best results in all cases, followed by the feature-based TLM, LSTM, and CNN, whose ranks alternate depending on the database under analysis.
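As an illustration of the classical end of the embedding spectrum mentioned above, the following is a minimal Bag-of-Words sketch in plain Python. It assumes whitespace tokenization and lowercasing; the two example reviews and all function names are hypothetical and do not come from the study's databases.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: index for index, tok in enumerate(tokens)}

def bag_of_words(document, vocabulary):
    """Return the token-count vector of a document over a fixed vocabulary."""
    counts = Counter(document.lower().split())
    # Dicts preserve insertion order, so columns follow the sorted vocabulary.
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical user reviews in Brazilian Portuguese.
reviews = ["produto muito bom", "produto muito ruim muito caro"]
vocabulary = build_vocabulary(reviews)
vectors = [bag_of_words(review, vocabulary) for review in reviews]
```

Each review becomes a fixed-length count vector that a standard binary classifier can consume; word order and context are discarded, which is the main limitation the neural embeddings address.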
Chronic pain is a multi-dimensional experience, and pain intensity plays an important part, impacting the patient's emotional balance, psychology, and behaviour. Standard self-reporting tools, such as the Visual Analogue Scale for pain, fail to capture this burden. Moreover, tools of this type are susceptible to a degree of subjectivity, as they depend on the patient's clear understanding of how to use them, on social biases, and on the patient's ability to translate a complex experience to a scale. To overcome these and other self-reporting challenges, pain intensity estimation has previously been studied based on facial expressions, electroencephalograms, brain imaging, and autonomic features. However, to the best of our knowledge, no attempt has been made to base this estimation on patients' narratives of their personal experience of chronic pain, which is what we propose in this work. Indeed, in the clinical assessment and management of chronic pain, verbal communication is essential to convey information to physicians that would otherwise not be easily accessible through standard reporting tools, since language, sociocultural, and psychosocial variables are intertwined. We show that language features from patient narratives indeed convey information relevant for pain intensity estimation, and that our computational models can take advantage of that. Specifically, our results show that patients with mild pain focus more on the use of verbs, whilst patients with moderate pain focus on adverbs and patients with severe pain focus on nouns and adjectives, and that these differences allow for the distinction between these three pain classes.
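Using sparse network connectivity between successive layers of deep neural networks has recently been shown to benefit large state-of-the-art models. However, network connectivity also plays a significant role in the learning curves of shallow networks, such as the classic restricted Boltzmann machine (RBM). A fundamental problem is efficiently finding connectivity patterns that improve the learning curve. Recent principled approaches explicitly include network connections as parameters that must be optimized in the model, but they often rely on continuous functions to represent connections and on explicit penalization. This work presents a method to find optimal connectivity patterns for RBMs based on the idea of network gradients: the gradient of every possible connection is computed, given a specific connectivity pattern, and is used to drive a continuous connection strength parameter that in turn determines the connectivity pattern. Thus, learning the RBM parameters and learning the network connections are truly performed jointly, albeit with different learning rates, and without changing the objective function. The method is applied to the MNIST dataset and is shown to find better RBM models for the benchmark tasks of sample generation and input classification.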
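Quantifying the fat depots around the heart is an accurate procedure for evaluating health risk factors correlated with several diseases. However, this type of evaluation is not widely used in clinical practice because of the human workload it requires. This work proposes a novel technique for the automatic segmentation of cardiac fat pads. The technique is based on applying classification algorithms to the segmentation of cardiac CT images. Furthermore, we extensively evaluate the performance of several algorithms on this task and discuss which of them provided better predictive models. Experimental results show that the mean accuracy for the classification of epicardial and mediastinal fat was 98.4%, with a mean true positive rate of 96.2%. On average, the Dice similarity index between the segmented patients and the ground truth was 96.8%. Our technique has therefore achieved the most accurate results to date for the automatic segmentation of cardiac fat.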
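The two overlap metrics reported above have standard definitions, sketched below for flattened binary masks. This is a minimal illustration and not the authors' evaluation code; the toy masks and function names are hypothetical.

```python
def dice_index(predicted, truth):
    """Dice similarity: 2 * |P ∩ T| / (|P| + |T|) for binary masks."""
    overlap = sum(1 for p, t in zip(predicted, truth) if p and t)
    total = sum(1 for p in predicted if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 2 * overlap / total if total else 1.0

def true_positive_rate(predicted, truth):
    """Fraction of ground-truth foreground pixels that were predicted."""
    positives = sum(1 for t in truth if t)
    hits = sum(1 for p, t in zip(predicted, truth) if p and t)
    return hits / positives if positives else 1.0

# Hypothetical flattened masks: 1 marks a pixel labelled as fat.
predicted_mask = [1, 1, 1, 0, 0, 0]
truth_mask = [1, 1, 0, 1, 0, 0]

dice = dice_index(predicted_mask, truth_mask)
tpr = true_positive_rate(predicted_mask, truth_mask)
```

Unlike plain accuracy, both metrics ignore the large background region, which is why they are informative when the segmented structure occupies a small part of the image.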
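TMIC is an App Inventor extension for deploying ML models for image classification developed with Google Teachable Machine in educational settings. Google Teachable Machine is an intuitive visual tool that provides workflow-oriented support for developing ML models for image classification. Targeting models developed with Google Teachable Machine, the TMIC extension enables trained models exported as TensorFlow.js to be deployed in App Inventor, one of the most popular block-based programming environments for teaching computing in K-12. The extension was created with the App Inventor extension framework, based on the PIC extension, and is available under the BSD 3 license. It can be used in K-12, in introductory courses in higher education, or by anyone interested in creating intelligent apps with image classification. The TMIC extension is provided by the initiative Computação na Escola of the Department of Informatics and Statistics at the Federal University of Santa Catarina/Brazil, as part of a research effort aimed at introducing AI education in K-12.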
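Modeling continuous dynamical systems from discretely sampled observations is a fundamental problem in data science. Often, such dynamics are the result of non-local processes that involve an integral over time. Such systems are therefore modeled with integro-differential equations (IDEs), generalizations of differential equations that comprise both an integral and a differential component. For example, brain dynamics are not accurately modeled by differential equations because their behavior is non-Markovian, i.e., the dynamics are partly determined by history. Here, we introduce the Neural IDE (NIDE), a framework that models the ordinary and integral components of IDEs using neural networks. We test NIDE on several toy and brain activity datasets and demonstrate that NIDE outperforms other models, including Neural ODE. These tasks include time extrapolation as well as predicting dynamics from unseen initial conditions, which we test on whole-cortex activity recordings in freely behaving mice. Furthermore, we show that NIDE can decompose dynamics into their Markovian and non-Markovian constituents via the learned integral operator, which we test on fMRI brain activity recordings under ketamine. Finally, the integrand of the integral operator provides a latent space that gives insight into the underlying dynamics, which we demonstrate on wide-field brain imaging recordings. Altogether, NIDE is a novel approach that enables modeling of complex non-local dynamics with neural networks.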
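To make the difference between a Markovian and a non-Markovian system concrete, the following minimal sketch integrates a scalar IDE of the assumed form dy/dt = f(t, y) + ∫₀ᵗ K(t, s) y(s) ds with forward Euler steps and a left Riemann sum for the memory term. The functions f and K are hand-written here, whereas NIDE learns both components with neural networks; the solver name, the chosen kernel, and the step size are illustrative assumptions.

```python
import math

def solve_ide(f, kernel, y0, t_end, dt):
    """Integrate dy/dt = f(t, y) + integral_0^t kernel(t, s) * y(s) ds."""
    n_steps = round(t_end / dt)
    ts, ys = [0.0], [y0]
    for step in range(n_steps):
        t, y = ts[-1], ys[-1]
        # Left Riemann sum over all earlier samples: this is the history term.
        memory = sum(kernel(t, s) * y_s for s, y_s in zip(ts[:-1], ys[:-1])) * dt
        ys.append(y + dt * (f(t, y) + memory))
        ts.append((step + 1) * dt)
    return ts, ys

def decay(t, y):
    """Local (Markovian) part of the dynamics: plain exponential decay."""
    return -y

# With a zero kernel the IDE reduces to the ODE dy/dt = -y.
_, markovian = solve_ide(decay, lambda t, s: 0.0, y0=1.0, t_end=1.0, dt=0.01)

# A constant positive kernel (an arbitrary choice) feeds the past back in.
_, with_memory = solve_ide(decay, lambda t, s: 0.5, y0=1.0, t_end=1.0, dt=0.01)
```

Because the memory term is recomputed from the whole stored history at every step, the cost grows quadratically with the number of steps, and the future state cannot be predicted from the current state alone.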
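The SepFormer architecture shows very good results in speech separation. Like other learned-encoder models, it uses short frames, as they have been shown to yield better performance in these cases. This results in a large number of frames at the input, which is problematic: since the SepFormer is transformer-based, its computational complexity increases drastically with longer sequences. In this paper, we employ the SepFormer in a speech enhancement task and show that, by replacing the learned-encoder features with a short-time Fourier transform (STFT) representation, we can use long frames without compromising perceptual enhancement performance. We obtained equivalent quality and intelligibility evaluation scores while reducing the computational cost for a 10-second utterance by a factor of approximately 8.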
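The link between frame length and sequence length can be checked with simple arithmetic. The sketch below counts the frames produced for a 10-second utterance under two framing choices. The 16 kHz sample rate, the frame lengths, and the 50% overlap are hypothetical values chosen for illustration and are not the paper's configuration, and the count alone does not reproduce the reported cost reduction.

```python
def frame_count(n_samples, frame_length, hop_length):
    """Number of full frames obtained without padding the signal."""
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // hop_length

SAMPLE_RATE = 16_000  # assumed sample rate in Hz
n_samples = 10 * SAMPLE_RATE  # a 10-second utterance

# Short frames, as typically used by learned encoders (assumed sizes).
short_frames = frame_count(n_samples, frame_length=16, hop_length=8)

# Long frames, as typically used with an STFT (assumed sizes).
long_frames = frame_count(n_samples, frame_length=512, hop_length=256)

# Ratio of sequence lengths seen by the transformer layers.
sequence_ratio = short_frames / long_frames
```

How much the cost drops for a given ratio depends on how attention is applied to the sequence, so this sketch only shows why longer frames shorten the input.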
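This paper presents a new approach that combines convolutional layers (CLs) and large-margin metric learning to train models on small datasets for texture classification. The core of this approach is a loss function that computes the distances between instances of interest and support vectors. The objective is to update the weights of the CLs iteratively so as to learn a representation with a large margin between classes. Each iteration produces a large-margin discriminant model, represented by support vectors, based on this representation. The advantage of the proposed approach w.r.t. convolutional neural networks (CNNs) is two-fold. First, it allows representation learning with a small amount of data, owing to the reduced number of parameters compared to an equivalent CNN. Second, it has a low training cost, since backpropagation considers only the support vectors. Experimental results on texture and histopathologic image datasets show that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence than equivalent CNNs.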
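The claim that backpropagation needs only the support vectors can be illustrated with a generic large-margin objective. The sketch below uses the standard binary hinge loss on a linear decision function, not the paper's distance-based loss; it shows that samples already outside the margin contribute neither loss nor gradient. All names and numbers are hypothetical.

```python
def hinge_loss_and_gradient(weights, bias, samples, labels):
    """Hinge loss sum(max(0, 1 - y * (w.x + b))) with its gradient.

    labels must be +1 or -1. Returns the loss, the gradients with respect to
    the weights and the bias, and the indices of margin-violating samples.
    """
    loss = 0.0
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    active = []
    for index, (x, y) in enumerate(zip(samples, labels)):
        margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
        if margin < 1.0:
            # Only margin violators add to the loss and to the gradient.
            loss += 1.0 - margin
            for j, value in enumerate(x):
                grad_w[j] -= y * value
            grad_b -= y
            active.append(index)
    return loss, grad_w, grad_b, active

# Hypothetical 2-D features, e.g. outputs of a small convolutional stack.
samples = [[2.0, 0.0], [0.5, 0.0], [-3.0, 0.0]]
labels = [1, 1, -1]

loss, grad_w, grad_b, active = hinge_loss_and_gradient([1.0, 0.0], 0.0, samples, labels)
```

Here only the second sample lies inside the margin, so a training step would need to backpropagate through the feature extractor for that sample alone.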